Non-parametric Bayesian approach to post-translational modification refinement of predictions from tandem mass spectrometry

Authors

  • Clement Chung
  • Andrew Emili
  • Brendan J. Frey
Abstract

MOTIVATION: Tandem mass spectrometry (MS/MS) is a dominant approach for large-scale high-throughput post-translational modification (PTM) profiling. Although current state-of-the-art blind PTM spectral analysis algorithms can predict thousands of modified peptides (PTM predictions) in an MS/MS experiment, a significant percentage of these predictions have inaccurate modification mass estimates and false modification site assignments. This problem can be addressed by post-processing the PTM predictions with a PTM refinement algorithm. We developed a novel PTM refinement algorithm, iPTMClust, which extends a recently introduced PTM refinement algorithm PTMClust and uses a non-parametric Bayesian model to better account for uncertainties in the quantity and identity of PTMs in the input data. The use of this new modeling approach enables iPTMClust to provide a confidence score per modification site that allows fine-tuning and interpreting resulting PTM predictions.

RESULTS: The primary goal behind iPTMClust is to improve the quality of the PTM predictions. First, to demonstrate that iPTMClust produces sensible and accurate cluster assignments, we compare it with k-means clustering, mixtures of Gaussians (MOG) and PTMClust on a synthetically generated PTM dataset. Second, in two separate benchmark experiments using PTM data taken from a phosphopeptide and a yeast proteome study, we show that iPTMClust outperforms state-of-the-art PTM prediction and refinement algorithms, including PTMClust. Finally, we illustrate the general applicability of our new approach on a set of human chromatin protein complex data, where we are able to identify putative novel modified peptides and modification sites that may be involved in the formation and regulation of protein complexes. Our method facilitates accurate PTM profiling, which is an important step in understanding the mechanisms behind many biological processes and should be an integral part of any proteomic study.

AVAILABILITY: Our algorithm is implemented in Java and is freely available for academic use from http://genes.toronto.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
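
As a rough illustration of why a non-parametric Bayesian model can leave the number of PTM groups open, the sketch below scores one observed modification mass against existing clusters and against a possible new cluster, using a Chinese restaurant process prior combined with a Gaussian likelihood. It is not the iPTMClust model: the function names, the parameters (alpha, sd, base_mean, base_sd) and the example cluster values are all assumptions chosen for illustration, and the abstract does not state which prior or likelihood the authors use.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def crp_assignment_probs(mass, clusters, alpha=1.0, sd=0.02,
                         base_mean=0.0, base_sd=100.0):
    """Return normalized assignment probabilities for one modification mass.

    clusters is a list of (count, mean_mass) pairs (hypothetical structure).
    The last entry of the result is the probability of opening a new cluster.
    All default parameter values are illustrative assumptions.
    """
    n = sum(count for count, _ in clusters)
    # Existing cluster: prior proportional to its size, times a narrow Gaussian.
    weights = [
        count / (n + alpha) * gaussian_pdf(mass, mean, sd)
        for count, mean in clusters
    ]
    # New cluster: prior proportional to alpha, times a broad base density.
    weights.append(alpha / (n + alpha) * gaussian_pdf(mass, base_mean, base_sd))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical clusters near phosphorylation (+79.966 Da)
    # and oxidation (+15.995 Da).
    clusters = [(40, 79.966), (10, 15.995)]
    print(crp_assignment_probs(79.97, clusters))  # mass close to an existing cluster
    print(crp_assignment_probs(42.01, clusters))  # mass far from both clusters
```

In this toy setup, a mass close to an existing cluster mean is assigned to that cluster, while a mass far from every cluster puts most of its probability on a new cluster. A fixed-k method such as k-means cannot do the latter without choosing k again.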

Similar resources

Computational refinement of post-translational modifications predicted from tandem mass spectrometry

MOTIVATION: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Hence, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale...

Bayesian false discovery rates for post-translational modification proteomics

Tandem mass spectrometry-based proteomics enables high-throughput analysis of post-translational modifications (PTMs) on proteins. In current shotgun proteomics research, peptides with various PTMs and those without PTMs are often identified together and an overall false discovery rate (FDR) is estimated. However, it is often the case that only a subset of identifications, e.g. those with ...

Protein Post-translational Modifications Mapping with MS/MS based Frequent Interval Pattern Mining

Tandem mass spectrometry (MS/MS)-based proteomics has proven to be an indispensable tool for large-scale protein identification and expression profiling tasks. However, analysis of protein post-translational modifications (PTMs) with MS/MS still presents formidable challenges. Although some heuristic algorithms have been developed for this problem, they are far from mature. In this paper,...

A New Hybrid De Novo Sequencing Method For Protein Identification

Tandem mass spectrometry is a powerful tool for studying proteins. However, an open problem for proteomics research is how to accurately identify proteins from the experimental mass spectra. De novo sequencing based protein identification is the only feasible approach for finding new proteins and studying protein post-translational modifications. In this paper, we describe our novel hybrid de n...

Mapping sites of O-GlcNAc modification using affinity tags for serine and threonine post-translational modifications.

Identifying sites of post-translational modifications on proteins is a major challenge in proteomics. O-Linked beta-N-acetylglucosamine (O-GlcNAc) is a dynamic nucleocytoplasmic modification more analogous to phosphorylation than to classical complex O-glycosylation. We describe a mass spectrometry-based method for the identification of sites modified by O-GlcNAc that relies on mild beta-elimin...

Journal:
  • Bioinformatics

Volume 29, Issue 7

Pages: -

Publication date: 2013